# POWER ESTIMATION SCHEME FOR LOW POWER ORIENTED BIOMEDICAL SOC EXTENDED TO VERY DEEP SUBMICRON TECHNOLOGY

Hong-Hui Chen, Tung-Chien Chen, Cheng-Yi Chiang, Liang-Gee Chen

{sernc, djchen, chiang831, lgchen}@video.ee.ntu.edu.tw

Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan

## ABSTRACT

This paper introduces a power estimation scheme for SoCs (Systems-on-Chip) and presents estimation results for different process nodes, extending to very deep submicron technology. Different power modeling strategies are used for analog and digital circuits. According to the analysis, ultra-low-power analog components are key to successful biomedical SoC design when more advanced fabrication technology is used. Meanwhile, the digital part should be designed to be just sufficient for the target application. Integrating more dedicated digital hardware accelerators can further reduce total power consumption by lowering the working frequency of the system processor. The goal of this paper is to provide a quantitative scheme for estimating power consumption when an SoC is fabricated with different process technologies, so that a suitable technology can be selected to manufacture an SoC for biomedical use.

*Index Terms*— Biomedical, SoC, power estimation, very deep submicron, ECG, EEG, ECoG

## 1. INTRODUCTION

The application of electronic devices to health care has drawn great interest in recent years. On the one hand, the world population is aging, which motivates the industry to invent for the future. On the other hand, mature semiconductor integrated circuit (IC) fabrication makes it possible to provide cost-effective products for biomedical applications. Large-scale equipment such as X-ray imaging, ultrasonic imaging, and magnetic resonance imaging (MRI) has proven successful in diagnosing and treating human diseases. More recently, personal devices such as the sphygmomanometer, blood glucose meter, and ear thermometer have been used successfully as point-of-care (POC) tools for an individual or a family. Existing devices are likely to remain in use, while others may be enhanced, integrated, or invented to provide better health care quality.

Table 1. Power model groups

| Baseline part                    | DSP part                                             |
|----------------------------------|------------------------------------------------------|
| amplifier                        | processor                                            |
| ADC                              | internal SRAM                                        |
| stimulator                       | other dedicated hardware (e.g., digital filter, FFT) |
| wireless (transmitter, receiver) |                                                      |

SoC development for biomedical applications is quite challenging. Among the challenges, power consumption is one of the most significant aspects to explore carefully during implementation. Analog circuits form the front end that records signals from the human body. Data acquisition is normally done by a first-stage amplifier whose input impedance is high enough to capture the tiny voltage potentials at the measuring points on the body. Some systems transfer the measured samples to a computer for further processing after analog-to-digital conversion (ADC). For mobile applications, it is desirable to acquire the signal on the spot and process it in an embedded way. Processing the signal digitally is an emerging trend in electronic biomedical devices.

To conduct system power analysis for this kind of system, we divide the power models into two groups, as shown in Table 1. Two different strategies are used to derive the power models. The first is clustering analysis based on the existing literature and is mostly used to derive the models for the baseline group. The second is curve fitting based on digital cell libraries and memory compilers for process technologies from 350nm to 65nm. From the fitting result, the power model of a very deep submicron technology such as 28nm is estimated. Based on the estimation results of the two power model groups, the power of the SoC fabricated with 350nm, 90nm, and 28nm technologies is calculated. The estimation results are also compared with the taped-out 350nm SoC as a case study.

The remainder of this paper is organized as follows. Preliminary background is provided first, followed by the steps used to build the power models. Cases based on different process technologies are then described, and insights from the power estimation results are discussed. Finally, the conclusion is given.

## 2. POWER MODELS

Based on a survey of the literature ([1, 2, 3] and others) and on experience gained from analog designers, the power consumption of analog designs is only loosely correlated with fabrication process technology. [1] is the best example of this phenomenon: a $0.8\mu m$ process is used to achieve an ultra-low-power amplifier design. Predicting the power of analog components is therefore unlike predicting that of digital components, which depend strongly on the process and are presented later. We therefore propose a statistics-based scheme, SCCI (Sort-Cluster-Correlation-Interpolation), to address this problem. The steps of SCCI are explained in subsection 2.1. The second power group is the DSP group, where two major components, the RISC processor and the internal SRAM, are currently analyzed for their power profiles. The power of components belonging to the DSP group shows a significant dependency on the fabrication technology.

### 2.1. Design components - Baseline group

The SCCI scheme is used to build the power model for the baseline components. The procedure of the scheme is as follows:

Table 2. Procedure of SCCI scheme

| For each component in baseline group                                                                                          |
|-------------------------------------------------------------------------------------------------------------------------------|
| 1) Sort the normalized power coefficients                                                                                     |
| 2) Cluster the sorting results into 3 subgroups using the K-means algorithm and record the center of gravity of each subgroup |
| 3) If there is no previous clustering result, go to step 6                                                                    |
| 4) Calculate the correlation coefficient between the center-of-gravity sets of the current and previous clustering results    |
| 5) If the correlation coefficient > predefined convergence threshold, break the loop and take the final center-of-gravity set as output |
| 6) Interpolate the sorting result and go to step 2                                                                            |

The input data set to SCCI is first normalized by the following factors: voltage (squared), channel count, amplifier bandwidth, ADC sampling rate, and ADC precision in bits. The raw data to be normalized are extracted from the existing literature ([3], [4], and others) and from commercial products' datasheets, for example TI's CC2430 [5] and Nordic's nRF24LE1 [6]. We take 0.98 as the convergence threshold for the correlation coefficient check, which ensures similarity between the two center-of-gravity sets. This threshold provides a trade-off between the number of loop iterations and the stability of the clustering result. In our analysis, running the Matlab implementation, the SCCI procedure converges within 3 iterations for every baseline component. The 3 elements in the SCCI output center-of-gravity set are labeled best case (BC), typical case (TC), and worst case (WC) for analog power estimation. Table 3 lists the analysis result for each baseline component; the unit of each row follows from the normalization process. The coefficients listed in Table 3 are used in the case analysis and elaboration section to estimate total system power.
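The SCCI loop of Table 2 can be sketched in Python as follows. This is only an illustration, not the authors' Matlab code: the quantile-based K-means initialization, the midpoint interpolation rule, the use of the Pearson coefficient, the `max_loops` guard, and all function names are assumptions, since the text does not specify them. The sample data are hypothetical.

```python
from statistics import mean

def kmeans_1d(values, k=3, iters=100):
    """Lloyd's k-means on 1-D data; returns the k centers in ascending order."""
    xs = sorted(values)
    # Assumed initialization: centers at evenly spaced quantiles of the data.
    centers = [xs[int((i + 0.5) * len(xs) / k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            groups[nearest].append(x)
        new = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return sorted(centers)

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def interpolate(xs):
    """Assumed interpolation: insert the midpoint between sorted neighbours."""
    out = []
    for a, b in zip(xs, xs[1:]):
        out += [a, (a + b) / 2]
    out.append(xs[-1])
    return out

def scci(coeffs, threshold=0.98, max_loops=20):
    """Sort, cluster, check correlation, interpolate; returns 3 centers."""
    data = sorted(coeffs)                      # step 1
    prev = None
    for _ in range(max_loops):
        centers = kmeans_1d(data)              # step 2
        if prev is not None and pearson(prev, centers) > threshold:
            return centers                     # steps 4 and 5
        prev = centers
        data = interpolate(data)               # step 6
    return centers

# Hypothetical normalized coefficients, for illustration only.
sample = [1, 2, 3, 10, 11, 12, 100, 110, 120]
bc, tc, wc = scci(sample)
```

The three returned centers play the role of the BC, TC, and WC coefficients. A stricter `threshold` would typically trade more loop iterations for a more stable center set.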

Table 3. SCCI analysis result for baseline group

| component         | BC      | TC      | WC      | unit                                        |
|-------------------|---------|---------|---------|---------------------------------------------|
| amplifier         | 5.3342  | 40.6234 | 1930.89 | $nW/(V^2 \cdot channel \cdot Hz)$           |
| ADC               | 0.282   | 24.0357 | 38.9821 | $nW/(V^2 \cdot channel \cdot Hz \cdot bit)$ |
| stimulator        | 8.24    | 15.7302 | 54.387  | $\mu W/(V^2 \cdot channel)$                 |
| TX, 0~2 dBm       | 1244.44 | 3650.00 | 5527.78 | $\mu W/(V^2 \cdot channel)$                 |
| RX, -100~-90 dBm  | 1204.44 | 3505.00 | 5736.11 | $\mu W/(V^2 \cdot channel)$                 |
### 2.2. Design components - DSP group

In our analysis, the DSP group includes the RISC processor and the internal SRAM. The digital cell libraries and SRAM compilers for different UMC process technologies are used in the evaluation flow. For the digital cell libraries, the 350nm, 250nm, 180nm, 130nm, 90nm, and 65nm process nodes are used to synthesize the RTL code of the OpenRISC implementation OR1200 [7]. OR1200 is an open-source soft IP core available from OpenCores.org. It is a 32-bit Harvard-architecture RISC core regarded as comparable to the ARM9 processor. We use it to derive the trend of power as the process node migrates. We have in fact taped out a 350nm SoC in which OR1200 serves as the system controller. The procedures for obtaining the power models of these two components are described in 2.2.1 and 2.2.2.

#### 2.2.1. Power model of RISC processor

A synthesis tool from the EDA vendor Synopsys is used to synthesize OR1200 with clock constraints of 0.5, 1, 1.25, 1.67, 2.5, 5, 10, 20, 50, and 100 MHz. We select a finer resolution in the lower frequency range because our application is intended to run at low frequency for power reasons. The synthesis result is shown in Figure 1. The topmost curve is the 350nm process node and the lowest is 65nm. The slope of the curve decreases as the process node becomes more advanced. First-order fitting is done on each curve to obtain its slope. Then, to estimate a process node whose power characteristics are not available, a third-order (cubic) fitting function is used for nonlinear regression in Matlab. Higher-order fittings generate similar results. With the regression result, the coefficient for 28nm can be estimated. However, the estimate based on all available process nodes is too optimistic for the 28nm process, and the fitting residuals are too large. A more reasonable result is obtained when only the deep submicron nodes from 180nm to 65nm are used as fitting inputs. The final fitting results are listed in Table 4, and the fitting curve is shown in Figure 2.

Fig. 1. RISC power vs. frequency for different process nodes

Fig. 2. RISC power coefficients vs. process nodes

Table 4. Estimated coefficients for deep submicron

| process | power coefficient ($\mu W/(V^2 \cdot MHz)$) |
|---------|---------------------------------------------|
| 180nm   | 29.3595                                     |
| 130nm   | 23.6716                                     |
| 90nm    | 16.29                                       |
| 65nm    | 11.2433                                     |
| 28nm    | 4.2203                                      |

The coefficients in Table 4 are used in the later case analysis. Note that several process variants are available for deep submicron nodes. Within our fitting range of $0.5 \sim 100$ MHz working frequency, the low leakage (LL) process outperforms the other choices, for example the standard process (SP). The above coefficients are therefore generated based on the LL process.
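With only four fitting inputs (180nm to 65nm), a cubic has as many coefficients as there are data points, so the fit reduces to interpolation through those points. The sketch below evaluates that cubic at 28nm in plain Python rather than Matlab. The Lagrange form and the function name are our choices for illustration, not the authors' implementation.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the unique polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total

# Fitting inputs from Table 4: process node (nm) and coefficient (µW/(V^2·MHz)).
nodes_nm = [180, 130, 90, 65]
coeffs = [29.3595, 23.6716, 16.29, 11.2433]

estimate_28nm = lagrange_eval(nodes_nm, coeffs, 28)
print(f"{estimate_28nm:.2f}")  # prints 4.22
```

The extrapolated value agrees with the 28nm entry of Table 4 to about three decimal places, which suggests the table entry was obtained from a cubic through these four nodes.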

#### 2.2.2. Power model of internal SRAM

The SRAM compilers available for the UMC foundry cover process nodes from 250nm to 65nm. The 350nm node is absent, so its power model coefficients are also obtained from the fitting result. Originally we fitted SRAM power with a voltage-normalized data set. However, the residuals of the fit were too large to be acceptable. We therefore adopt the non-voltage-normalized data set to obtain the necessary estimation coefficients. Table 5 shows the fitting result for 1-port SRAM of different sizes. Note that the unit of the coefficients does not contain a $V^2$ term.

Table 5. Fitting result for 1-port SRAM by size in bits (unit: $\mu W/MHz$)

| process | 128x32   | 256x32   | 512x32   | 1024x32  |
|---------|----------|----------|----------|----------|
| 350nm   | 111.4878 | 125.5074 | 146.5372 | 176.8799 |
| 250nm   | 36.0238  | 39.9643  | 46.095   | 56.8743  |
| 180nm   | 14.9297  | 16.2361  | 18.4959  | 23.6662  |
| 130nm   | 9.0995   | 9.8107   | 11.1786  | 14.6322  |
| 90nm    | 7.0272   | 7.6223   | 8.7749   | 11.4492  |
| 65nm    | 5.9446   | 6.4963   | 7.5363   | 9.7386   |
| 28nm    | 3.4441   | 3.8053   | 4.4554   | 5.6637   |

It can be seen that the larger the SRAM, the larger the power coefficient. In addition, the power grows by a smaller ratio than the size. The coefficients in Table 5 are likewise used to model SRAM power in the case analysis of the following section.
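The sub-linear growth can be checked directly from the Table 5 values. The short Python sketch below compares the smallest and largest macros at every node; the variable names are ours.

```python
# Table 5 coefficients (µW/MHz) for the 128x32 and 1024x32 macros.
small_128x32 = {"350nm": 111.4878, "250nm": 36.0238, "180nm": 14.9297,
                "130nm": 9.0995, "90nm": 7.0272, "65nm": 5.9446,
                "28nm": 3.4441}
large_1024x32 = {"350nm": 176.8799, "250nm": 56.8743, "180nm": 23.6662,
                 "130nm": 14.6322, "90nm": 11.4492, "65nm": 9.7386,
                 "28nm": 5.6637}

SIZE_RATIO = 1024 / 128  # the capacity grows eightfold

# Ratio of the power coefficients for the same eightfold size increase.
power_ratio = {node: large_1024x32[node] / small_128x32[node]
               for node in small_128x32}

for node, ratio in power_ratio.items():
    print(f"{node}: size x{SIZE_RATIO:.0f}, power coefficient x{ratio:.2f}")
```

At every node the coefficient grows by a factor between about 1.5 and 1.7 for an eightfold increase in capacity.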

## 3. CASE ANALYSIS AND ELABORATION

A very successful case has already been demonstrated by Medtronic [8], whose deep brain stimulation is used to treat Parkinson's disease. The estimated power of the Medtronic device serves as a good reference for state-of-the-art power management in electronic biomedical systems. Other implemented and imaginary cases are also analyzed in the following subsections.

### 3.1. Deep brain stimulation

According to information provided by surgeons, the working period of a deep brain stimulator is about 5 years. The stimulation pattern is about $100 \sim 200$Hz with a $60 \sim 150 \mu$s turn-on period, applying a $3V \sim 5V$ stimulation voltage. The impedance of the human brain is about $1350\Omega$. Assuming that the device has a battery with hundreds of mAh of capacity, its power consumption will be around $200 \sim 300 \mu$W. This level of power consumption sets the required specification for a device to be implanted in the human body.
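As a rough plausibility check on this power level, the average power delivered by the stimulation pulses can be bounded from the stated pattern. The Python sketch below assumes rectangular voltage pulses into a purely resistive $1350\Omega$ load on a single channel and ignores all electronics overhead. None of these assumptions is stated in the text, and the function name is ours.

```python
def avg_stim_power_uw(volts, ohms, rate_hz, pulse_s):
    """Average power (µW) of rectangular voltage pulses into a resistor."""
    duty = rate_hz * pulse_s          # fraction of time the pulse is on
    return volts ** 2 / ohms * duty * 1e6

# Corners of the stimulation pattern quoted in the text.
low = avg_stim_power_uw(3.0, 1350.0, 100.0, 60e-6)    # about 40 µW
high = avg_stim_power_uw(5.0, 1350.0, 200.0, 150e-6)  # about 556 µW

print(f"{low:.1f} to {high:.1f} uW")
```

Under these assumptions the delivered power lies between about 40 and 556 $\mu$W, a range that contains the quoted $200 \sim 300 \mu$W. The check says nothing about how that figure was actually derived.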

### 3.2. Designed and planned SoC chips

We designed our first SoC with the 350nm process node for preliminary animal experiments. The chip should be small enough to be mounted on a mouse. The SoC specifications are summarized in Table 6. These specifications are used to conduct the power estimation for our planned SoC chips.

Table 6. Targeted SoC specifications

| component  | quantity | specification         |
|------------|----------|-----------------------|
| amplifier  | 16       | bandwidth 200Hz       |
| ADC        | 1        | sampling rate 6.4kHz  |
| stimulator | 8        | $0{\sim}5V$           |
| RF TX/RX   | 1/1      | -5dBm/-90dBm          |
| processor  | 1        | clock frequency 20MHz |
| SRAM       | 2        | capacity 1024x32 bits |

#### 3.2.1. Previously developed 350nm SoC

The taped-out 350nm chip has specifications similar to those of Table 6. The power of the digital portion of this chip is 1.87 times that of a single RISC processor, and this 1.87 overhead ratio is taken into account in the power calculation. The wireless TX/RX is assumed to be active 1% of the time. The estimated power of the 350nm SoC with BC analog components is shown in the first column of Table 7. Based on simulation results, the power consumption of our developed analog components is smaller than the BC values, so our chip is likely to outperform the estimate.

#### 3.2.2. Imaginary 90nm and 28nm SoC

Table 7. Power estimated from derived models (unit: $\mu W$)

| power       | 3V@350nm | 1V@90nm | 0.9V@28nm |
|-------------|----------|---------|-----------|
| baseline(1) | 5875.98  | 652.89  | 528.84    |
| RISC(2)     | 48548.64 | 611.18  | 158.34    |
| (1)+(2)     | 54424.62 | 1264.06 | 687.18    |
| SRAM(3)     | 7075.20  | 457.97  | 226.55    |
| (1)+(2)+(3) | 61499.81 | 1722.03 | 913.73    |

BC analog components are used to construct the power estimation in Table 7. Judging from the table, the power reduction achieved by moving from 350nm to 90nm is much larger than that from 90nm to 28nm: the ratio is 35.7 for 350nm to 90nm versus 1.9 for 90nm to 28nm. Fabricating the SoC with more advanced technology therefore cuts total power drastically, but the deeper the process node, the smaller the additional reduction. Furthermore, analog components become power dominant as more advanced technology is adopted. Note that even with very deep submicron technology such as the 28nm process node, the power consumption of the planned SoC is about 4 times the estimated figure for the Medtronic device (200~300 $\mu$W). This reflects the challenge of producing a realistic electronic medical product for implantation, where the battery should not be changed frequently. For external noninvasive electronic medical devices, the power levels provided by the 90nm or 28nm SoC meet the requirement that the device keep working for several months.
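The ratios quoted above, and the growing analog share, can be recomputed from the tables. In the Python sketch below, the baseline and RISC rows are copied from Table 7. The SRAM row is rebuilt as coefficient × clock × instance count from Tables 5 and 6, which is our reading of how that row was obtained. All variable names are ours.

```python
# Baseline (analog) and RISC rows of Table 7, in µW.
baseline = {"350nm": 5875.98, "90nm": 652.89, "28nm": 528.84}
risc = {"350nm": 48548.64, "90nm": 611.18, "28nm": 158.34}

# SRAM row rebuilt from the 1024x32 coefficients of Table 5 (µW/MHz)
# and the 20 MHz clock and 2 SRAM instances of Table 6.
sram_coeff = {"350nm": 176.8799, "90nm": 11.4492, "28nm": 5.6637}
CLOCK_MHZ, SRAM_COUNT = 20, 2
sram = {node: c * CLOCK_MHZ * SRAM_COUNT for node, c in sram_coeff.items()}

total = {node: baseline[node] + risc[node] + sram[node] for node in baseline}
analog_share = {node: baseline[node] / total[node] for node in baseline}

ratio_350_to_90 = total["350nm"] / total["90nm"]
ratio_90_to_28 = total["90nm"] / total["28nm"]

print(f"350nm -> 90nm: x{ratio_350_to_90:.1f}")
print(f"90nm -> 28nm: x{ratio_90_to_28:.1f}")
for node, share in analog_share.items():
    print(f"{node}: analog share {share:.0%}")
```

The rebuilt SRAM values match the SRAM(3) row of Table 7. The analog share rises from roughly 10% at 350nm to about 38% at 90nm and about 58% at 28nm, which is the sense in which the analog components become dominant.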

## 4. CONCLUSION

Based on the power estimation outcome, it is important to integrate best-case analog components in the SoC, since these components dominate the power consumption when the SoC is fabricated with more advanced technology. Meanwhile, the digital part should keep its working frequency as low as possible while serving the desired application: never over-design, and treasure every bit of energy from the battery. In addition, if more dedicated digital hardware is integrated to accelerate the DSP algorithms, the working frequency of the RISC processor can be further reduced. The capacity and quantity of internal SRAM should also be chosen carefully to keep the power consumed in memory at a reasonable level. Wisely selecting among design paradigms to overcome the design difficulties is very important for biomedical system design. This work provides a quantitative power estimation scheme that considers both analog and digital circuits. Power consumption can be estimated in advance to reduce the risk of exceeding the power budget of an SoC project for biomedical applications.

## 5. REFERENCES

- [1] Denison, T. et al., "A 2 μW, 95nV/rtHz, chopper-stabilized instrumentation amplifier for chronic measurement of bio-potentials," *IEEE Instrumentation and Measurement Technology Conference Proceedings*, 2007.
- [2] Xiaodan Zou et al., "A 1V 22 μW 32-channel implantable EEG recording IC," *IEEE International Solid-State Circuits Conference*, 2010.
- [3] Verma, N. et al., "A micro-power EEG acquisition SoC with integrated feature extraction processor for a chronic seizure detection system," *IEEE Journal of Solid-State Circuits*, 2010.
- [4] Moo Sung Chae et al., "A 128-channel 6 mW wireless neural recording IC with spike feature extraction and UWB transmitter," *IEEE Transactions on Neural Systems and Rehabilitation Engineering*, 2009.
- [5] Texas Instruments, http://www.ti.com/.
- [6] NORDIC semiconductor, http://www.nordicsemi.com/.
- [7] OpenRISC project, http://opencores.org/openrisc,or1200.
- [8] Medtronic corporation, http://www.medtronic.com/.